Phylogenetic study of the evolution of PEP-carboxykinase.

Phosphoenolpyruvate carboxykinase (PCK) is the key enzyme that initiates the gluconeogenic pathway in vertebrates, yeast, plants and most bacteria. Nucleotide specificity divides all PCKs into two groups. All mammalian and most archaeal PCKs are GTP-specific. Bacterial and fungal PCKs can be ATP- or GTP-specific, but all plant PCKs are ATP-specific. Amino acid sequence alignment of PCK enzymes shows that the nucleotide-binding sites are largely conserved within each class, with a few exceptions that lack any clear ATP- or GTP-specific binding motif. Although the active site residues are mostly conserved in all PCKs, there is no significant sequence homology between ATP- and GTP-dependent PCK enzymes. Only one PCK, from the planctomycete Candidatus Kuenenia stuttgartiensis, shows sequence homology with both ATP- and GTP-dependent PCKs. Phylogenetic analyses were performed to understand the evolutionary relationships among PCKs from different sources. Based on this study, a flowchart of the evolution of PCK is proposed.


Introduction
Phosphoenolpyruvate carboxykinase (PCK), a carboxylase enzyme (EC 4.1.1.32 for the GTP-dependent form, EC 4.1.1.49 for the ATP-dependent form), is present in all known groups of living organisms. It catalyzes the metal-nucleotide-coupled, reversible decarboxylation and phosphorylation that interconverts oxaloacetate (OAA) and phosphoenolpyruvate (PEP); the direction depends on the system and the availability of the intermediates. In vertebrates, fungi, plants and most bacteria, production of PEP from OAA by PCK is the key step of gluconeogenesis, which produces glucose during fasting. In humans, increased gluconeogenesis is responsible for the high blood glucose level in non-insulin-dependent diabetes mellitus (NIDDM) patients. In healthy people, by contrast, the cytosolic PCK enzyme is present only during glucose starvation; it rapidly disappears when glucose is resupplied, owing to hormonal control of the transcription of the cytosolic PCK gene. In some bacteria such as Anaerobiospirillum succiniciproducens (Cotelesage et al. 2005), parasitic helminths such as Ascaris suum (Rohrer et al. 1986) and nematodes such as Haemonchus contortus (Klein et al. 1992), PCK carries out the reverse reaction, producing OAA from PEP. In kinetoplastid parasites, such as Trypanosoma cruzi (Trapani et al. 2001) and all species of the genus Leishmania, the enzyme is very active even in the presence of high levels of carbohydrate, producing a mixture of CO2, succinate and alanine as end products.
PCKs can be divided into two groups based on their specificity towards the nucleotide substrate: ATP-dependent PCKs are mainly present in bacteria, yeast and plants, whereas GTP-specific PCKs are mostly present in higher eukaryotes, most archaea and some bacteria (Fukuda et al. 2004). Fungal PCKs can be either ATP- or GTP-dependent. While there is significant sequence homology within each class, no statistically significant homology is found between the PCKs of the two classes. Structural studies of the six different PCK structures solved so far demonstrate that the metal-binding and oxaloacetate-binding active site residues are conserved in both ATP- and GTP-dependent PCKs (Holyoak and Nowak, 2006; Cotelesage et al. 2005; Dunten et al. 2002; Trapani et al. 2001; Matte et al. 1996; Sudom et al. 2001; Leduc et al. 2005). The nucleotide-binding motifs are also almost fully conserved within each class but, with a few exceptions, are unique to that class. Most archaeal PCKs and the Giardia intestinalis PCK have a unique GTP-binding region that differs from the GTP-binding motif present in other GTP-specific PCKs (Fukuda et al. 2004). It is therefore of interest to know how and why the two types of nucleotide specificity evolved.
The universal tree of evolution based on ribosomal RNA separates the three domains of archaea, bacteria and eukarya and places extreme thermophiles at the base of the bacteria (Brown et al. 2001). The alternative approach of building the universal tree from protein sequences produces considerable mixing between domains. Despite this intermixing, the protein-based method might be useful for resolving the branching more accurately, as proteins control the cellular processes that maintain life. The three-dimensional structure of a protein in its active form can provide important clues about how the protein performs its function. However, determining a protein structure is not always easy. Sequence alignment of similar proteins from distantly related organisms can also reveal evolutionary relationships. Recent progress in genome sequencing projects across a wide variety of species allows more robust phylogenetic relationships among species to be resolved.
Alignment of all PCK enzymes from the NCBI data bank shows some interesting results. In this paper we built phylogenetic trees from the PCK protein and gene sequences available at NCBI up to 2007. To simplify the figure, we present only a few selected species from all three domains of life. Extensive sequence alignment shows that the active site amino acid residues and metal-binding sites (kinase 1a and kinase 2 regions) are almost, but not completely, conserved in PCKs of any origin, and that some PCKs have different nucleotide-binding sequences with no obvious specific nucleotide-binding motif for ATP or GTP. Based on protein sequence similarity and nucleotide specificities, we propose a very simple evolutionary flowchart for PCK.

Phylogenetic Analysis
Initial alignments of the PCK protein and gene sequences from all species were performed using ClustalW (Higgins et al. 1994). The Protdist and Dnadist programs of the PHYLIP package (Felsenstein, 1989) were then run on the aligned sequences to produce distance matrix files. Neighbor-Joining and UPGMA (unweighted pair group method with arithmetic mean) clustering were applied to the distance matrix files, and finally Drawtree was used to produce an unrooted tree diagram for all PCKs showing the evolutionary relationships between the species (Felsenstein, 1989). The phylogenetic tree based on the PCK enzyme sequences (Fig. 1) cleanly divides ATP- and GTP-dependent PCKs into two regions. Within each region of the tree, the PHYLIP programs also grouped related species in the same branch or in close proximity, with very few exceptions, which are labeled. The PCK of the archaeon Aeropyrum pernix (A-per) is placed close to other ATP-dependent bacterial PCKs, although it does not contain the conserved ATP-binding pocket (RX5TR) of bacterial or eukaryotic PCKs. In the NCBI data bank, the Aeropyrum pernix (A-per) PCK (NCBI accession no: Q9YG68) is the only ATP-dependent PCK from archaea. A-per PCK shows less than 10% sequence homology with all other archaeal PCKs, which are GTP-specific, but it shares more than 25% sequence homology (by bl2seq) with other ATP-specific PCKs of bacterial or eukaryotic origin (Table 1) (Tatusova and Madden, 1999). Another PCK, from the planctomycete bacterium Candidatus Kuenenia stuttgartiensis (C-stu) (NCBI accession no: CAJ75104), is placed close to A-per PCK. C-stu PCK also lacks a conserved ATP-binding pocket. These two PCKs might have a conserved arginine in the ATP-binding pocket, as shown in the active site alignment (Table 2). The A-per and C-stu PCKs share 31% identical and 51% similar (positive) amino acid residues. C-stu PCK has 23%-36% amino acid sequence identity (by bl2seq) with various ATP- and GTP-specific PCKs from bacterial and eukaryotic species (Table 1).
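The distance-matrix and clustering steps described above can be illustrated with a minimal sketch. It is not the PHYLIP implementation: Protdist and Dnadist estimate distances under substitution models, whereas the sketch below swaps in a plain p-distance (fraction of differing ungapped columns) and a bare UPGMA that returns only the topology. The function names and the toy sequences are hypothetical and are not real PCK sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    `dist` maps frozenset({a, b}) -> distance for every pair of names.
    Returns the tree topology as nested tuples (branch lengths are omitted).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical, already-aligned toy sequences (NOT real PCK sequences).
toy = {
    "atp_1": "MKTAYGDDRK",
    "atp_2": "MKTAYGDDRR",
    "gtp_1": "MRSCWGNDKE",
    "gtp_2": "MRSCWGNEKQ",
}
toy_dist = {
    frozenset(pair): p_distance(toy[pair[0]], toy[pair[1]])
    for pair in combinations(toy, 2)
}
tree = upgma(list(toy), toy_dist)
print(tree)
```

UPGMA assumes a constant rate of change along all branches, which is one reason Neighbor-Joining is usually run alongside it; the sketch is only meant to show how a tree topology follows from a pairwise distance matrix.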
C-stu PCK shows no sequence similarity with the other archaeal PCKs (except A-per) or with a few eukaryotic PCKs. A PCK from the archaeon Thermoplasma volcanium (T-vol) (NCBI accession no: P58306), which contains a well-conserved GTP-binding pocket like those of other GTP-dependent bacterial and eukaryotic PCKs, also shows sequence similarity (23% identity and 40% similarity) with C-stu PCK. The PCK from T-vol shows sequence similarity and identity with all other archaeal GTP-dependent PCKs as well as with all other GTP-dependent PCKs (Table 1). Two other GTP-dependent PCKs, from Drosophila melanogaster (D-mel) (NCBI accession no: P20007) and Chasmagnathus granulate (C-gra) (NCBI accession no: AAL78163), also show sequence homology with a few other GTP-specific PCK enzymes, similar to that of T-vol (Table 1).
In archaea there are two distinct types of GTP-binding pocket: one group has a conserved GTP-binding pocket like that of other bacterial and eukaryotic PCKs (F/YXXXF/Y), while in a few other archaea, and in Giardia intestinalis (NCBI accession no: AAG47713), the PCKs have distinct GTP-binding residues (Table 2). The PCKs of these archaea and of Giardia intestinalis also show more than 23% sequence identity with T-vol PCK.
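As a rough illustration, the two nucleotide-pocket motifs quoted in the text can be screened for with regular expressions. This is only a sketch: it assumes that X stands for any residue and that "F/YXXXF/Y" means F or Y, three arbitrary residues, then F or Y. The function name and the toy sequence are hypothetical. A raw pattern match is only a hint; the assignments in Table 2 rest on the full alignment.

```python
import re

# Motifs as quoted in the text, with X read as "any residue" (an assumption).
MOTIFS = {
    "ATP pocket (RX5TR)": re.compile(r"R.{5}TR"),
    "GTP pocket (F/YXXXF/Y)": re.compile(r"[FY].{3}[FY]"),
}


def find_motifs(seq: str) -> dict:
    """Return {motif name: [(1-based start, matched text), ...]}.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    hits = {}
    for name, pattern in MOTIFS.items():
        lookahead = re.compile(f"(?=({pattern.pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq)]
    return hits


# Hypothetical toy sequence (NOT a real PCK sequence).
toy_seq = "MAGRLSEDKTRPQFGHKYAA"
for motif, found in find_motifs(toy_seq).items():
    print(motif, found)
```

The shorter kinase 1a (XKT), kinase 2 (XDD) and PCK-specific (XKK) motifs would match by chance at many positions, so a pattern search alone is not informative for them; they need the positional context that the alignment provides.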
In two other PCKs, from Lactobacillus casei (L-cas) (NCBI accession no: ZP_00386357) and Streptococcus bovis (S-bov) (NCBI accession no: BAE46992), the amino acid sequence of the ATP-binding pocket is not fully conserved (Table 2).
PCKs from Nephrops norvegicus (N-nor) (NCBI accession no: CAB65311) and Litopenaeus vannamei (L-van) (NCBI accession no: CAB85964) lack the lysine of the kinase 1a/P-loop (XKT), which is conserved in all other PCKs, although the threonine is conserved (Table 2). The alignment (Table 2) shows that in Nephrops norvegicus the conserved glycine is also absent. In Lactobacillus casei and Streptococcus bovis, there is one serine residue between the lysine and the threonine. The asparagine residue conserved in this P-loop in GTP-specific PCKs is replaced by threonine in ATP-dependent PCKs. In the GTP-dependent PCKs with the unique GTP-specific pocket, which is not conserved in other bacterial and eukaryotic PCKs, this asparagine is replaced by a serine residue. Similarly, the alignment shows that A-per PCK lacks the second lysine of the PCK-specific domain (XKK). The kinase 2 (XDD) domain is conserved in all PCKs; X is glycine in most cases, with a few exceptions in which it is replaced by serine, histidine or glutamine. Initially we aligned all the PCK sequences available at NCBI and built the phylogenetic tree; we then selected a few PCKs from each domain, together with the PCKs of particular interest (Table 2), for the figure. The reactive cysteine residue (Cys 288 in the human enzyme), which is conserved in most GTP-specific PCKs, is replaced by serine in the T-vol and Thermoplasma acidophilum (T-aci) (NCBI accession no: Q9HLV2) PCKs.
The phylogenetic tree based on PCK genes (Fig. 2) is partly similar to the tree obtained from PCK enzymes, with a few exceptions. A-per and C-stu PCKs are in the same branch, but D-mel, a GTP-dependent PCK, is placed closer to the ATP-dependent PCKs. L-cas and S-bov, two ATP-dependent PCKs, are placed in the same branch but closer to the GTP-specific PCKs. Otherwise the program grouped similar PCKs together, as in the phylogenetic tree based on PCK enzymes.
The pairwise scores from ClustalW (Table 3) for the PCK genes, which represent the identity between two sequences, are inconclusive. The score is 10 between A-per and C-stu; all other scores for these two PCK genes against other PCK genes are below 10. For D-mel PCK, the gene scores are also very low against all other PCKs. The other two GTP-dependent PCKs, C-gra and T-vol, show high scores only with other GTP-dependent PCKs.
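For orientation, a pairwise identity score of this kind can be computed from two aligned sequences as sketched below. The sketch assumes the usual definition, identical positions as a percentage of the positions compared with gap columns excluded; it does not reproduce ClustalW's own alignment step, and the function name and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)


# Hypothetical aligned toy gene fragments (NOT real PCK genes).
score = percent_identity("ATG-CGTA", "ATGACGAA")
print(round(score, 1))
```

Because the denominator counts only ungapped columns, the score depends on how the two sequences were aligned, so values from a global alignment are not directly comparable with the local-alignment percentages reported by bl2seq in Table 1.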

Discussion Based on Phylogenetic Results and Sequence Alignments
The sequence alignment of all the PCKs demonstrates the presence of a PCK-specific domain (XKK), a kinase 1a/P-loop (XKT), a kinase 2 region (GDD) and nucleotide-specific pockets (Table 2). The catalytic site consists of the PCK-specific domain and the kinase 1a and kinase 2 regions, which are almost completely conserved in ATP- and GTP-dependent PCKs from all organisms, with very few exceptions (Table 2). The alignment of all PCKs available at NCBI reveals a few interesting features of some PCKs. Structural analysis by X-ray diffraction shows that the second lysine in the PCK-specific domain binds the divalent metal ion in the second metal-binding site. The absence of this conserved second lysine in A-per PCK might eliminate the second metal-binding site. The lysine residue in the GKT/P-loop (Lys 254 in E. coli) interacts with both the β- and γ-phosphoryl groups of ATP. The positive charge of the lysine helps to stabilize the phosphoryl group during transfer (Matte et al. 2001). The absence of this lysine in N-nor and L-van might affect the reactivity of these enzymes. The P-loop also contains a reactive cysteine, which is mostly conserved in all GTP-dependent PCKs. This reactive cysteine, however, is replaced by serine in T-vol and T-aci, two archaeal PCKs with a well-defined GTP-binding motif. It would be interesting to know how the absence of this cysteine affects the stability of the P-loop in the GTP-dependent PCKs of T-vol and T-aci.

Table 2. Alignment of the active site residues, including the PCK-specific domain, kinase 1a (P-loop), kinase 2 region and nucleotide-binding sites, of a few selected PCKs. Active site residues are shown in red, mismatches in blue.

Although the phylogenetic tree (Fig. 1) constructed from selected PCK enzymes is very similar to the tree of Fukuda et al. (2004), which emphasized the position of the archaeal PCKs, the current tree is more extensive. We have included a few more PCKs from other species to explain the evolutionary origin of PCK. The tree reported here also shows that ATP- and GTP-dependent PCKs are present in all three major domains of life. All mammalian PCKs are GTP-dependent, plant PCKs are ATP-dependent, and bacterial and fungal PCKs can be ATP- or GTP-dependent. It is interesting to note that the parasitic nematodes Ascaris suum and Haemonchus contortus, whose PCKs preferentially carboxylate PEP to OAA, are placed in a separate branch of the tree.
Many scientists hypothesize that the Archaea are the closest modern relatives of earth's first living cells. They are called universal ancestors, from which all other life is believed to have evolved (Woese, 2000). All archaeal PCKs are GTP-dependent except that of Aeropyrum pernix (A-per). In terms of amino acid sequence similarity, A-per PCK is closer to the ATP-binding PCKs (Table 1). It does not have any sequence similarity with other archaeal PCKs, which justifies its position in the phylogenetic tree (Fig. 1).
The most interesting PCK is that of the planctomycete bacterium Candidatus Kuenenia stuttgartiensis (C-stu), which is placed close to A-per PCK. ATP- and GTP-specific PCKs share less than 10% sequence similarity with each other, but C-stu PCK is a notable exception (Table 1). C-stu PCK has 23%-36% amino acid sequence identity (by bl2seq) with various ATP- and GTP-specific PCKs of bacterial and eukaryotic origin (Table 1). It is the only PCK so far that has sequence similarity with both ATP- and GTP-binding PCKs but lacks any well-defined nucleotide-binding motif. There is, however, another point to consider. C-stu PCK shows no sequence similarity with most other archaeal PCKs, T-vol being the only exception, or with a few other GTP-dependent PCKs. The sequence homology between the archaeal T-vol PCK and C-stu PCK might give us the missing link. Sequence homology analysis also shows that two other GTP-specific PCKs, from D-mel and C-gra, have similar sequence homology with other GTP-specific PCKs and also with C-stu PCK, but not with any ATP-dependent PCKs. We can place C-stu PCK, which has sequence similarity with both ATP- and GTP-dependent PCKs and does not have a well-recognized nucleotide-binding pocket, at the root (Table 2). Does this mean that PCKs initially had no ATP or GTP specificity, and that these specificities were acquired gradually during evolution? The position of planctomycetes in the phylogenetic tree is also very controversial. One recent analysis placed planctomycetes at a deep-branching position near the root (Fuerst, 1995). Based on this concept and our homology studies, we propose a very simplified evolutionary flowchart for PCK (Table 4). It would be interesting to know the active site residues and the tertiary structures of C-stu, T-vol and A-per PCKs to complement this evolutionary perspective.